Abstract

Natural or artificial vision systems process the images that they collect with their eyes or cameras
in order to derive information for performing tasks related to navigation and recognition.
Since  the  way  images  are  acquired  determines  how  difficult  it  is  to  perform  a  visual  task,  and 
since  systems  have  to  cope  with  limited  resources,  the  eyes  used  by  a  specific  system  should 
be designed to optimize subsequent image processing as it relates to particular tasks. Different
ways of sampling light, i.e., different eyes, may be less or more powerful with respect to
particular  competences.  This  seems  intuitively  evident  in  view  of  the  variety  of  eye  designs  in 
the  biological  world.  It  is  shown  here  that  a  spherical  eye  (an  eye  or  system  of  eyes  providing 
panoramic  vision)  is  superior  to  a  camera-type  eye  (an  eye  with  restricted  field  of  view)  as 
regards  the  competence  of  three-dimensional  motion  estimation.  This  result  is  derived  from 
a  statistical  analysis  of  all  the  possible  computational  models  that  can  be  used  for  estimating 
3D  motion  from  an  image  sequence.  The  findings  explain  biological  design  in  a  mathematical 
manner,  by  showing  that  systems  that  fly  and  thus  need  good  estimates  of  3D  motion  gain 
advantages  from  panoramic  vision.  Also,  insights  obtained  from  this  study  point  to  new  ways 
of  constructing  powerful  imaging  devices  that  suit  particular  tasks  in  robotics,  visualization 
and  virtual  reality  better  than  conventional  cameras,  thus  leading  to  a  new  camera  technology. 


Special  thanks  to  Sara  Litison  for  her  editorial  and  graphics  assistance.  The  support  of  the  Office  of  Naval 
When classifying eye designs in biological systems, one can differentiate between the different
ways  of  gathering  light  at  the  retina,  whether  single  or  multiple  lenses  are  used,  the  spatial 
distribution  of  the  photoreceptors,  the  shapes  of  the  imaging  surfaces,  and  what  geometrical  and 
physical properties of light are measured (frequency, polarization). A landscape of eye evolution
is  provided  by  Michael  Land  in  [3].  Considering  evolution  as  a  mountain,  with  the  lower  hills 
representing  earlier  steps  in  the  evolutionary  ladder,  and  the  highest  peaks  representing  later 
stages  of  evolution,  the  situation  is  pictured  in  Figure  1.  At  the  higher  levels  of  evolution  one 
finds the compound eyes of insects and crustaceans and the camera-type eyes such as the corneal
eyes  of  land  vertebrates  and  fish.  These  two  categories  constitute  two  fundamentally  different 
designs.  Fundamental  differences  also  arise  from  the  positions  in  the  head  where  camera-type 
eyes are placed, for example, close to each other as in humans and primates, or on opposite
sides  of  the  head  as  in  birds  and  fish,  providing  panoramic  vision.  It  appears  that  the  eyes  of 
an  organism  evolve  in  a  way  that  best  serves  that  organism  in  carrying  out  its  tasks.  Thus, 
the success of an eye design should not be judged in an anthropocentric manner, i.e., by how
accurately  it  forms  an  image  of  the  outside  world;  rather,  it  should  be  judged  in  a  purposive 
sense.  A  successful  eye  design  is  one  that  makes  the  performance  of  the  visual  tasks  a  system  is 
confronted  with  as  easy  as  possible  (fast  and  robust)  [18].  The  discovery  of  principles  relating 
eye  design  to  system  behavior  will  shed  light  on  the  problem  of  evolution  in  general,  and  on 
the  structure  and  function  of  the  brain  in  particular.  At  the  same  time,  it  will  contribute  to 
the  development  of  alternative  camera  technologies;  cameras  replace  eyes  in  artificial  systems 
and  different  camera  designs  will  be  more  or  less  appropriate  for  different  tasks.  Cameras  used 
in  alarm  systems,  inspection  processes,  virtual  reality  systems  and  human  augmentation  tasks 
need  not  be  the  same;  they  should  be  designed  to  facilitate  the  tasks  at  hand.  This  paper 
represents  a  first  effort  to  introduce  structure  into  the  landscape  of  eyes  as  it  relates  to  tasks 
that  systems  perform. 

[Figure 1 labels: Compound Eyes; Camera-Type Eyes]

Figure 1: Michael Land’s landscape of eye evolution (from [3]).

Although  the  space  of  tasks  or  behaviors  performed  by  vision  systems  is  difficult  to  formalize, 
there  exist  a  few  tasks  that  are  performed  by  the  whole  spectrum  of  vision  systems.  All  systems 
with  vision  move  in  their  environments.  As  they  move,  they  need  to  continuously  make  sense 
of  the  moving  images  they  receive  on  their  retinae  and  they  need  to  solve  problems  related  to 
navigation; in particular, they need to know how they themselves are moving [1, 4, 20]. Inertial
sensors  can  help  in  this  task,  but  it  is  vision  that  can  provide  accurate  answers.  Regardless 
of  the  way  in  which  a  system  moves  (walks,  crawls,  flies,  etc.),  its  eyes  move  rigidly.  This 
rigid  motion  can  be  described  by  a  translation  and  a  rotation;  knowing  how  a  system  moves 
amounts  to  knowing  the  parameters  describing  its  instantaneous  velocity.  This  is  not  to  say,  of 
course,  that  a  vision  system  has  an  explicit  representation  of  the  parameters  of  the  rigid  motion 
that its eyes undergo. This knowledge could be implicit in the circuits that perform specific
tasks,  such  as  stabilization,  landing,  pursuit,  etc.  [9,  14,  26,  28],  but  successful  completion  of 
navigation-related  tasks  presupposes  some  knowledge  of  the  egomotion  parameters  or  subsets 
of  them.  Thus,  a  comparison  of  eyes  with  regard  to  egomotion  estimation  should  lead  to  a 
better  understanding  of  one  of  the  most  basic  visual  competences. 

Two fundamentally different eye designs are compared here, a spherical eye and a planar,
camera-type eye (Figure 2). Spherical eyes model the compound eyes of insects, while planar
eyes  model  the  corneal  eyes  of  land  vertebrates  as  well  as  fish.  In  addition,  the  panoramic 
vision  of  some  organisms,  achieved  by  placing  camera-type  eyes  on  opposite  sides  of  the  head,  is 
approximated well by a spherical eye. The essential difference between a spherical and a planar
eye  lies  in  the  field  of  view,  360  degrees  in  the  spherical  case  and  a  restricted  field  in  the  planar 
case.  The  comparison  performed  here  demonstrates  that  spherical  eyes  are  superior  to  planar 
eyes  for  3D  motion  estimation.  “Superior”  here  means  that  the  ambiguities  inherent  in  deriving 
3D  motion  from  planar  image  sequences  are  not  present  in  the  spherical  case.  Specifically,  a 
geometrical/statistical  analysis  is  conducted  to  investigate  the  functions  that  can  be  used  to 
estimate  3D  motion,  relating  2D  image  measurements  to  the  3D  scene.  These  functions  are 
expressed in terms of errors in the 3D motion parameters and they can be understood as
multidimensional surfaces in those parameters. 3D motion estimation amounts to a minimization
problem;  thus,  our  approach  is  to  study  the  relationships  among  the  parameters  of  the  errors 
in  the  estimated  3D  motion  at  the  minima  of  the  surfaces,  because  these  locations  provide 
insight  into  the  behaviors  of  the  estimation  procedures.  It  is  shown  that,  at  the  locations  of 
the  minima,  the  errors  in  the  estimates  of  both  the  translation  and  rotation  are  non-zero  in 
the  planar  case,  while  in  the  spherical  case  either  the  translational  or  rotational  error  becomes 
zero.  Intuitively,  with  a  camera-type  eye  there  is  an  unavoidable  confusion  between  translation 
and  rotation,  as  well  as  between  translational  errors  and  the  actual  translation.  This  confusion 
does  not  occur  with  a  spherical  eye.  The  implication  is  that  visual  navigation  tasks  involving 
3D  motion  parameter  estimation  are  easier  to  solve  with  spherical  eyes  than  with  planar  eyes. 

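The basic geometry of image motion is well understood. As a system moves in its environment,
every point of the environment has a velocity vector relative to the system. The
projections of these 3D velocity vectors on the retina of the system’s eye constitute the motion
field. For an eye moving with translation $\mathbf{t}$ and rotation $\boldsymbol{\omega}$ in a stationary environment, each
scene point $\mathbf{R} = (X, Y, Z)$ measured with respect to a coordinate system $OXYZ$ fixed to the
nodal point of the eye has velocity $\dot{\mathbf{R}} = -\mathbf{t} - \boldsymbol{\omega} \times \mathbf{R}$. Projecting $\dot{\mathbf{R}}$ onto a retina of a given
shape gives the image motion field. If the image is formed on a plane (Figure 2a) orthogonal to
the $Z$ axis at distance $f$ (focal length) from the nodal point, then an image point $\mathbf{r} = (x, y, f)$
and its corresponding scene point $\mathbf{R}$ are related by $\mathbf{r} = \frac{f}{\mathbf{R} \cdot \mathbf{z}_0}\,\mathbf{R}$, where $\mathbf{z}_0$ is a unit vector in the
direction of the $Z$ axis. The motion field becomes

$$\dot{\mathbf{r}} = -\frac{1}{\mathbf{R} \cdot \mathbf{z}_0}\,\bigl(\mathbf{z}_0 \times (\mathbf{t} \times \mathbf{r})\bigr) + \frac{1}{f}\,\mathbf{z}_0 \times \bigl(\mathbf{r} \times (\boldsymbol{\omega} \times \mathbf{r})\bigr) = \frac{1}{Z}\,\mathbf{u}_{\mathrm{tr}}(\mathbf{t}) + \mathbf{u}_{\mathrm{rot}}(\boldsymbol{\omega}), \tag{1}$$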

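with $Z = \mathbf{R} \cdot \mathbf{z}_0$ representing the depth. If the image is formed on a sphere of radius $f$ (Figure 2b)
having the center of projection as its origin, the image $\mathbf{r}$ of any point $\mathbf{R}$ is $\mathbf{r} = \frac{f\,\mathbf{R}}{|\mathbf{R}|}$, with $R = |\mathbf{R}|$
being the norm of $\mathbf{R}$ (the range), and the image motion is

$$\dot{\mathbf{r}} = \frac{1}{|\mathbf{R}|\,f}\,\bigl((\mathbf{t} \cdot \mathbf{r})\,\mathbf{r} - f^{2}\,\mathbf{t}\bigr) - \boldsymbol{\omega} \times \mathbf{r} = \frac{1}{R}\,\mathbf{u}_{\mathrm{tr}}(\mathbf{t}) + \mathbf{u}_{\mathrm{rot}}(\boldsymbol{\omega}). \tag{2}$$

Figure 2: Image formation on the sphere (a) and on the plane (b). The system moves with a
rigid motion with translational velocity $\mathbf{t}$ and rotational velocity $\boldsymbol{\omega}$. Scene points $\mathbf{R}$ project
onto image points $\mathbf{r}$ and the 3D velocity $\dot{\mathbf{R}}$ of a scene point is observed in the image as image
velocity $\dot{\mathbf{r}}$.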

The motion field is the sum of two components, one, $\mathbf{u}_{\mathrm{tr}}$, due to translation and the other,
$\mathbf{u}_{\mathrm{rot}}$, due to rotation. The translational flow is inversely proportional to the depth $Z$ or range $R$
of a scene point, while the rotational flow is independent of the scene in view. As can be seen
from (1) and (2), the effects of translation and scene depth cannot be separated, so only the
direction of translation, $\mathbf{t}/|\mathbf{t}|$, can be computed. We can thus choose the length of $\mathbf{t}$; throughout
the following analysis $f$ is set to 1, and the length of $\mathbf{t}$ is assumed to be 1 on the sphere and
the $Z$-component of $\mathbf{t}$ to be 1 on the plane. The problem of egomotion then amounts to finding
the scaled vector $\mathbf{t}$ and the vector $\boldsymbol{\omega}$ from a representation of the motion field.
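As a rough numerical illustration of (1) and (2), the Python sketch below evaluates both motion fields for a single scene point with $f = 1$ and compares them with a finite-difference velocity obtained by moving the point with $\dot{\mathbf{R}} = -\mathbf{t} - \boldsymbol{\omega} \times \mathbf{R}$. The function names (`planar_flow`, `spherical_flow`, `numeric_flow`) and the sample values are illustrative assumptions, not part of the original analysis.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def scale(k, a):
    return (k*a[0], k*a[1], k*a[2])

Z0 = (0.0, 0.0, 1.0)  # unit vector along the Z axis

def planar_flow(R, t, w):
    """Image point r and flow on the plane Z = 1, following equation (1)."""
    Z = dot(R, Z0)                                   # depth
    r = scale(1.0 / Z, R)                            # r = R / (R . z0)
    u_tr = scale(-1.0, cross(Z0, cross(t, r)))       # translational component
    u_rot = cross(Z0, cross(r, cross(w, r)))         # rotational component
    return r, add(scale(1.0 / Z, u_tr), u_rot)

def spherical_flow(R, t, w):
    """Image point r and flow on the unit sphere, following equation (2)."""
    rng = math.sqrt(dot(R, R))                       # range |R|
    r = scale(1.0 / rng, R)                          # r = R / |R|
    u_tr = add(scale(dot(t, r), r), scale(-1.0, t))  # (t . r) r - t
    u_rot = scale(-1.0, cross(w, r))                 # -(w x r)
    return r, add(scale(1.0 / rng, u_tr), u_rot)

def numeric_flow(model, R, t, w, h=1e-6):
    """Finite-difference image velocity for a point moving with R_dot = -t - w x R."""
    R_dot = add(scale(-1.0, t), scale(-1.0, cross(w, R)))
    r0, _ = model(R, t, w)
    r1, _ = model(add(R, scale(h, R_dot)), t, w)
    return scale(1.0 / h, add(r1, scale(-1.0, r0)))

if __name__ == "__main__":
    R = (0.3, -0.2, 2.0)       # hypothetical scene point
    t = (0.1, 0.2, 1.0)        # hypothetical translation
    w = (0.01, -0.02, 0.03)    # hypothetical rotation
    for name, model in (("plane", planar_flow), ("sphere", spherical_flow)):
        _, flow = model(R, t, w)
        print(name, flow, numeric_flow(model, R, t, w))
```

In both models the rotational component is computed without any reference to depth or range, which is the property the text relies on.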

To  set  up  mathematical  formulations  for  3D  motion  estimation,  the  following  questions 
should be answered. The first question to be addressed is, what description containing information
about 3D motion does a system use to represent the image sequence? One might
envision  a  sophisticated  system  that  could  attempt  to  estimate  the  motion  field,  termed  the 
optic  flow  field  [15].  On  the  other  hand,  it  is  also  easy  to  envision  a  system  that  does  not  have 
the  capacity  to  estimate  the  motion  field,  but  only  to  obtain  a  partial  description  of  it.  An 
example  of  a  description  containing  minimal  information  about  image  motion  is  the  normal 
motion  field.  This  amounts  to  the  projection  of  the  motion  field  onto  the  direction  of  the  image 
gradient  at  each  point,  and  represents  the  movement  of  each  local  edge  element  in  the  direction 
perpendicular  to  itself.  Normal  flow  can  be  estimated  from  local  spatiotemporal  information  in 
the image [22-24, 27]. If $\mathbf{n}$ is a unit vector at an image point denoting the orientation of the
gradient at that point, the normal flow $v_n$ satisfies

$$v_n = \dot{\mathbf{r}} \cdot \mathbf{n}. \tag{3}$$
Unlike normal flow, the estimation of optic flow is a difficult problem because information
from  different  image  neighborhoods  must  be  compared  and  used  in  a  smoothing  scheme  to 
account  for  discontinuities  [10, 12].  Although  it  is  not  yet  known  exactly  what  kinds  of  image 
representations  different  visual  systems  recover,  it  is  clear  that  such  descriptions  should  lie 
somewhere  between  normal  flow  fields  and  optic  flow  fields.  Thus,  when  comparing  eye  designs 
with  regard  to  3D  motion  estimation,  one  must  consider  both  kinds  of  flow  fields. 

The  second  question  to  be  addressed  is,  through  what  geometric  laws  or  constraints  is  3D 
motion  coded  into  image  motion?  The  constraints  are  easily  observed  from  (1-3).  Equations  (1) 
and  (2)  show  how  the  motions  of  image  points  are  related  to  3D  rigid  motion  and  to  scene  depth. 
By  eliminating  depth  from  these  equations,  one  obtains  the  well  known  epipolar  constraint  [19]; 
for  both  planar  and  spherical  eyes  it  is 


$$(\mathbf{t} \times \mathbf{r}) \cdot (\dot{\mathbf{r}} + \boldsymbol{\omega} \times \mathbf{r}) = 0. \tag{4}$$


Equating image motion with optic flow, this constraint allows for the derivation of 3D rigid
motion on the basis of optic flow measurements. One is interested in the estimates of translation
$\hat{\mathbf{t}}$ and rotation $\hat{\boldsymbol{\omega}}$ which best satisfy the epipolar constraint at every point $\mathbf{r}$ according to some
criterion of deviation. The Euclidean norm is usually used, leading to the minimization [11, 21]
of the function^1

$$M_{ep} = \iint_{\text{image}} \bigl[(\hat{\mathbf{t}} \times \mathbf{r}) \cdot (\dot{\mathbf{r}} + \hat{\boldsymbol{\omega}} \times \mathbf{r})\bigr]^{2}\, d\mathbf{r}. \tag{5}$$
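A discrete stand-in for (5) can make the minimization concrete. The sketch below, which assumes a unit sphere, randomly placed image points and ranges drawn uniformly from [1, 5], replaces the integral by a mean over samples; `make_flow_field` and `epipolar_cost` are hypothetical helper names introduced only for this illustration.

```python
import math
import random

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def random_unit_vector(gen):
    """Uniformly distributed direction on the unit sphere."""
    while True:
        v = (gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0))
        norm = math.sqrt(dot(v, v))
        if norm > 1e-9:
            return (v[0] / norm, v[1] / norm, v[2] / norm)

def spherical_flow(r, rng, t, w):
    """Flow at unit image point r for a scene point at range rng (equation (2), f = 1)."""
    tr = dot(t, r)
    wxr = cross(w, r)
    return tuple((tr * r[i] - t[i]) / rng - wxr[i] for i in range(3))

def make_flow_field(t, w, count=500, seed=0):
    """Synthetic optic flow at random image points (assumed test scene)."""
    gen = random.Random(seed)
    points = [random_unit_vector(gen) for _ in range(count)]
    flows = [spherical_flow(r, gen.uniform(1.0, 5.0), t, w) for r in points]
    return points, flows

def epipolar_cost(points, flows, t_hat, w_hat):
    """Mean squared epipolar residual, a discrete approximation of M_ep in (5)."""
    total = 0.0
    for r, flow in zip(points, flows):
        wxr = cross(w_hat, r)
        derotated = (flow[0] + wxr[0], flow[1] + wxr[1], flow[2] + wxr[2])
        residual = dot(cross(t_hat, r), derotated)
        total += residual * residual
    return total / len(points)

if __name__ == "__main__":
    t, w = (0.0, 0.0, 1.0), (0.02, -0.01, 0.03)   # hypothetical true motion
    points, flows = make_flow_field(t, w)
    print("true motion:     ", epipolar_cost(points, flows, t, w))
    print("wrong translation:", epipolar_cost(points, flows, (1.0, 0.0, 0.0), w))
```

With noise-free flow the cost vanishes (up to rounding) at the true motion and is strictly positive for the perturbed estimate, which is all the sketch is meant to show.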


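On the other hand, if normal flow is given, the vector equations (1) and (2) cannot be used
directly. The only constraint is scalar equation (3), along with the inequality $Z > 0$ which states
that since the surface in view is in front of the eye its depth must be positive. Substituting (1)
or (2) into (3) and solving for the estimated depth $\hat{Z}$ or range $\hat{R}$, we obtain for a given estimate
$\hat{\mathbf{t}}, \hat{\boldsymbol{\omega}}$ at each point $\mathbf{r}$:

$$\hat{Z}\ (\text{or } \hat{R}) = \frac{\mathbf{u}_{\mathrm{tr}}(\hat{\mathbf{t}}) \cdot \mathbf{n}}{\bigl(\dot{\mathbf{r}} - \mathbf{u}_{\mathrm{rot}}(\hat{\boldsymbol{\omega}})\bigr) \cdot \mathbf{n}}. \tag{6}$$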


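If the numerator and denominator of (6) have opposite signs, negative depth is computed.
Thus, to utilize the positivity constraint one must search for the motion $\hat{\mathbf{t}}, \hat{\boldsymbol{\omega}}$ that produces
a minimum number of negative depth estimates. Formally, if $\mathbf{r}$ is an image point, define the
indicator function

$$I_{nd}(\mathbf{r}) = \begin{cases} 1 & \text{for } \bigl(\mathbf{u}_{\mathrm{tr}}(\hat{\mathbf{t}}) \cdot \mathbf{n}\bigr)\bigl((\dot{\mathbf{r}} - \mathbf{u}_{\mathrm{rot}}(\hat{\boldsymbol{\omega}})) \cdot \mathbf{n}\bigr) < 0 \\ 0 & \text{for } \bigl(\mathbf{u}_{\mathrm{tr}}(\hat{\mathbf{t}}) \cdot \mathbf{n}\bigr)\bigl((\dot{\mathbf{r}} - \mathbf{u}_{\mathrm{rot}}(\hat{\boldsymbol{\omega}})) \cdot \mathbf{n}\bigr) > 0 \end{cases}$$

Then estimation of 3D motion from normal flow amounts to minimizing [4, 5, 13] the function

$$M_{nd} = \iint_{\text{image}} I_{nd}(\mathbf{r})\, d\mathbf{r}. \tag{7}$$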
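The counting behind (7) can be sketched in a few lines of Python. The example assumes a unit sphere, synthetic normal flow generated from (2) and (3), and gradient directions chosen at random in the tangent plane; `make_normal_flow` and `negative_depth_fraction` are hypothetical names used only here.

```python
import math
import random

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def unit(v):
    norm = math.sqrt(dot(v, v))
    return (v[0] / norm, v[1] / norm, v[2] / norm)

def random_unit_vector(gen):
    return unit((gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0), gen.gauss(0.0, 1.0)))

def u_tr(t, r):
    """Translational flow component on the unit sphere: (t . r) r - t."""
    tr = dot(t, r)
    return (tr * r[0] - t[0], tr * r[1] - t[1], tr * r[2] - t[2])

def u_rot(w, r):
    """Rotational flow component on the unit sphere: -(w x r)."""
    c = cross(w, r)
    return (-c[0], -c[1], -c[2])

def make_normal_flow(t, w, count=400, seed=1):
    """Synthetic samples (r, n, v_n) with positive ranges in [1, 5] (assumed scene)."""
    gen = random.Random(seed)
    samples = []
    for _ in range(count):
        r = random_unit_vector(gen)
        n = unit(cross(random_unit_vector(gen), r))        # tangent direction at r
        rng = gen.uniform(1.0, 5.0)
        v_n = dot(u_tr(t, r), n) / rng + dot(u_rot(w, r), n)  # equation (3)
        samples.append((r, n, v_n))
    return samples

def negative_depth_fraction(samples, t_hat, w_hat):
    """Fraction of samples for which (6) yields a negative range estimate."""
    negatives = 0
    for r, n, v_n in samples:
        numerator = dot(u_tr(t_hat, r), n)
        denominator = v_n - dot(u_rot(w_hat, r), n)
        if numerator * denominator < 0:
            negatives += 1
    return negatives / len(samples)

if __name__ == "__main__":
    t, w = (0.0, 0.0, 1.0), (0.02, -0.01, 0.03)   # hypothetical true motion
    samples = make_normal_flow(t, w)
    print(negative_depth_fraction(samples, t, w))
    print(negative_depth_fraction(samples, (0.0, 0.0, -1.0), w))
```

The true motion produces no negative estimates, while reversing the translation with the correct rotation flips the sign of every numerator, so every estimate becomes negative.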


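Expressing $\dot{\mathbf{r}}$ in terms of the real motion from (1) and (2), functions (5) and (7) can be
expressed in terms of the actual and estimated motion parameters $\mathbf{t}$, $\boldsymbol{\omega}$, $\hat{\mathbf{t}}$ and $\hat{\boldsymbol{\omega}}$ (or, equivalently,
the actual motion parameters $\mathbf{t}, \boldsymbol{\omega}$ and the errors $\mathbf{t}_\epsilon = \mathbf{t} - \hat{\mathbf{t}}$, $\boldsymbol{\omega}_\epsilon = \boldsymbol{\omega} - \hat{\boldsymbol{\omega}}$) and the depth $Z$
(or range $R$) of the viewed scene. To conduct any analysis, a model for the scene is needed.
We are interested in the statistically expected values of the motion estimates resulting from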
^1 Because $\hat{\mathbf{t}} \times \mathbf{r}$ introduces the sine of the angle between $\hat{\mathbf{t}}$ and $\mathbf{r}$, the minimization prefers vectors $\hat{\mathbf{t}}$ close to
the center of gravity of the points $\mathbf{r}$. This bias has been recognized [25] and alternatives have been proposed
that reduce this bias, but without eliminating the confusion between rotation and translation.
all possible scenes. Thus, as our probabilistic model we assume that the depth values of the
scene are uniformly distributed between two arbitrary values $Z_{\min}$ (or $R_{\min}$) and $Z_{\max}$ (or $R_{\max}$)
($0 < Z_{\min} < Z_{\max}$). For the minimization of negative depth values, we further assume that the
directions in which flow measurements are made are uniformly distributed in every direction
for every depth. Parameterizing $\mathbf{n}$ by $\psi$, the angle between $\mathbf{n}$ and the $x$ axis, we thus obtain
the following two functions:

$$E_{ep} = \int_{Z=Z_{\min}}^{Z_{\max}} M_{ep}\, dZ, \tag{8}$$

$$E_{nd} = \int_{\psi=0}^{\pi} \int_{Z=Z_{\min}}^{Z_{\max}} M_{nd}\, dZ\, d\psi, \tag{9}$$

measuring deviation from the epipolar constraint and the amount of negative depth, respectively.
Functions (8) and (9) are five-dimensional surfaces in $\mathbf{t}_\epsilon, \boldsymbol{\omega}_\epsilon$, the errors in the motion
parameters.

We are interested in the topographic structure of these surfaces, in particular, in the relationships
among the errors and the relationships of the errors to the actual motion parameters
at the minima of the functions. The idea behind this is that in practical situations any estimation
procedure is hampered by errors and usually local minima of the functions to be minimized
are found as solutions.

Independent of the particular algorithm, procedures for estimating 3D motion can be classified
into those estimating either the translation or rotation as a first step and the remaining
component  (that  is,  the  rotation  or  translation)  as  a  second  step,  and  those  estimating  all 
components  simultaneously.  Procedures  of  the  former  kind  result  when  systems  utilize  inertial 
sensors  which  provide  them  with  estimates  of  one  of  the  components,  or  when  two-step  motion 
estimation  algorithms  are  used. 

Thus, three cases need to be studied: the case where no prior information about 3D motion
is available and the cases where an estimate of translation or rotation is available with some
error. Imagine that somehow the rotation has been estimated, with an error $\boldsymbol{\omega}_\epsilon$. Then our
functions become two-dimensional in the variables $\mathbf{t}_\epsilon$ and represent the space of translational
error parameters corresponding to a fixed rotational error. Similarly, given a translational
error $\mathbf{t}_\epsilon$, the functions become three-dimensional in the variables $\boldsymbol{\omega}_\epsilon$ and represent the space
of rotational errors corresponding to a fixed translational error. To study the general case,
one needs to consider the lowest valleys of the functions in 2D subspaces which pass through
0. In the image processing literature, such local minima are often referred to as ravine lines
or courses.^2 Each of the three cases is studied for four optimizations: epipolar minimization
for the sphere and the plane and minimization of negative depth for the sphere and the plane.
Thus, there are twelve (four times three) cases, but since the effects of rotation on the image are
independent of depth, it makes no sense to perform minimization of negative depth assuming an
estimate of translation is available. Thus, we are left with ten different cases which are studied
below. These ten cases represent all the possible, meaningful motion estimation procedures on
the plane and sphere.

^2 One may wish to study the problem in the presence of noise in the flow measurements and derive instead the
expected values of the local and global minima. It has been shown, however, that noise which is of no particular
bias does not alter the local minima, and the global minima fall within the valleys of the function without noise.
In particular, we considered in [7] noise $\mathbf{N}$ of the form $\mathbf{N} = \boldsymbol{\epsilon}/Z + \boldsymbol{\delta}$, with $\boldsymbol{\epsilon}, \boldsymbol{\delta}$ 2D, independent, stochastic error
vectors. As such noise does not alter the function’s overall structure, it won’t be considered here; the interested
reader is referred to [7].
Epipolar Minimization on the Plane. Denote estimated quantities by letters with hat
signs, actual quantities by unmarked letters, and the differences between actual and estimated
quantities (the errors) by the subscript “$\epsilon$.” Furthermore, let $\mathbf{t} = (x_0, y_0, 1)$ and $\boldsymbol{\omega} = (\alpha, \beta, \gamma)$.
Since the field of view is small, the quadratic terms in the image coordinates are very small
relative to the linear and constant terms, and are therefore ignored.

Considering a circular aperture of radius $e$, setting the focal length $f = 1$, $W = 1$ and
$\hat{W} = 1$, the function in (8) becomes


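$$E_{ep} = \int_{Z=Z_{\min}}^{Z_{\max}} \int_{r=0}^{e} \int_{\phi=0}^{2\pi} \left( \left( \frac{x - x_0}{Z} - \beta_\epsilon + \gamma_\epsilon y \right)(y - \hat{y}_0) - \left( \frac{y - y_0}{Z} + \alpha_\epsilon - \gamma_\epsilon x \right)(x - \hat{x}_0) \right)^{2} r\, dr\, d\phi\, dZ,$$

where $(r, \phi)$ are polar coordinates ($x = r\cos\phi$, $y = r\sin\phi$). Performing the integration, with the
second-order term $\gamma_\epsilon (x^2 + y^2)$ again neglected, one obtains

$$\begin{aligned} E_{ep} = \pi e^{2} \Bigl[\, & (Z_{\max} - Z_{\min}) \Bigl( \tfrac{e^{2}}{4} \bigl( \gamma_\epsilon^{2} (\hat{x}_0^{2} + \hat{y}_0^{2}) + 2\gamma_\epsilon (\hat{x}_0 \alpha_\epsilon + \hat{y}_0 \beta_\epsilon) + \alpha_\epsilon^{2} + \beta_\epsilon^{2} \bigr) + (\hat{x}_0 \alpha_\epsilon + \hat{y}_0 \beta_\epsilon)^{2} \Bigr) \\ & + \bigl( \ln Z_{\max} - \ln Z_{\min} \bigr) \Bigl( \tfrac{e^{2}}{2} \bigl( \gamma_\epsilon (x_{0\epsilon} \hat{y}_0 - y_{0\epsilon} \hat{x}_0) + x_{0\epsilon} \beta_\epsilon - y_{0\epsilon} \alpha_\epsilon \bigr) + 2 (x_{0\epsilon} y_0 - y_{0\epsilon} x_0)(\hat{x}_0 \alpha_\epsilon + \hat{y}_0 \beta_\epsilon) \Bigr) \\ & + \Bigl( \frac{1}{Z_{\min}} - \frac{1}{Z_{\max}} \Bigr) \Bigl( \tfrac{e^{2}}{4} (x_{0\epsilon}^{2} + y_{0\epsilon}^{2}) + (x_{0\epsilon} y_0 - y_{0\epsilon} x_0)^{2} \Bigr) \Bigr]. \end{aligned} \tag{10}$$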


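(a) Assume that the translation has been estimated with a certain error $\mathbf{t}_\epsilon = (x_{0\epsilon}, y_{0\epsilon}, 0)$.
Then the relationship among the errors in 3D motion at the minima of (10) is obtained from
the first-order conditions $\frac{\partial E_{ep}}{\partial \alpha_\epsilon} = \frac{\partial E_{ep}}{\partial \beta_\epsilon} = \frac{\partial E_{ep}}{\partial \gamma_\epsilon} = 0$, which yield

$$\alpha_\epsilon = \frac{y_{0\epsilon} \bigl( \ln Z_{\max} - \ln Z_{\min} \bigr)}{Z_{\max} - Z_{\min}}, \qquad \beta_\epsilon = -\frac{x_{0\epsilon} \bigl( \ln Z_{\max} - \ln Z_{\min} \bigr)}{Z_{\max} - Z_{\min}}, \qquad \gamma_\epsilon = 0. \tag{11}$$

It follows that $\beta_\epsilon / \alpha_\epsilon = -x_{0\epsilon} / y_{0\epsilon}$, $\gamma_\epsilon = 0$, which means that there is no error in $\gamma$ and the
projection of the translational error on the image is perpendicular to the projection of the
rotational error. This constraint is called the “orthogonality constraint.”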


(b) Assuming that rotation has been estimated with an error $(\alpha_\epsilon, \beta_\epsilon, \gamma_\epsilon)$, the relationship
among the errors is obtained from $\frac{\partial E_{ep}}{\partial x_{0\epsilon}} = \frac{\partial E_{ep}}{\partial y_{0\epsilon}} = 0$. In this case, the relationship is very
elaborate and the translational error depends on all the other parameters, that is, the rotational
error, the actual translation, the image size and the depth interval.


(c) In the general case, we need to study the subspaces in which $E_{ep}$ changes least at its
absolute minimum; that is, we are interested in the direction of the smallest second derivative
at 0, the point where the motion errors are zero. To find this direction, we compute the Hessian
at 0, that is the matrix of the second derivatives of $E_{ep}$ with respect to the five motion error
parameters, and compute the eigenvector corresponding to the smallest eigenvalue. The scaled
components of this vector amount to

$$x_{0\epsilon} = \hat{x}_0, \qquad y_{0\epsilon} = \hat{y}_0, \qquad \beta_\epsilon = -\alpha_\epsilon \frac{\hat{x}_0}{\hat{y}_0}, \qquad \gamma_\epsilon = 0,$$

$$\alpha_\epsilon = \frac{2 \hat{y}_0 Z_{\min} Z_{\max} \bigl( \ln Z_{\max} - \ln Z_{\min} \bigr)}{(Z_{\max} - Z_{\min})(Z_{\max} Z_{\min} - 1) + \Bigl( (Z_{\max} - Z_{\min})^{2} (Z_{\max} Z_{\min} - 1)^{2} + 4 Z_{\max}^{2} Z_{\min}^{2} \bigl( \ln Z_{\max} - \ln Z_{\min} \bigr)^{2} \Bigr)^{1/2}}.$$

As can be seen, for points defined by this direction, the translational and rotational errors
are characterized by the orthogonality constraint $\beta_\epsilon / \alpha_\epsilon = -x_{0\epsilon} / y_{0\epsilon}$ and by the constraint
$x_0 / y_0 = \hat{x}_0 / \hat{y}_0$; that is, the projection of the actual translation and the projection of the
estimated translation lie on a line passing through the image center. We refer to this second
constraint as the “line constraint.” These results are in accordance with previous studies
[2, 21], which found that the translational components along the $x$ and $y$ axes are confused
with rotation around the $y$ and $x$ axes, respectively, and the “line constraint” under a set of
restrictive assumptions.
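The planar results for cases (a) and (c) can be checked numerically with a short Python sketch. It rests on assumptions that go beyond the text: the aperture integral is taken with the polar area element $r\,dr\,d\phi$, the term $\gamma_\epsilon(x^2 + y^2)$ is neglected, and the function names are hypothetical. Under those assumptions the epipolar residual is linear in $x$ and $y$, so the cost reduces to three depth integrals of the form $\int (p/Z + q)^2\, dZ$.

```python
import math

def depth_moments(z_min, z_max):
    """Integrals of 1, 1/Z and 1/Z^2 over the depth interval [z_min, z_max]."""
    return (z_max - z_min, math.log(z_max / z_min), 1.0 / z_min - 1.0 / z_max)

def planar_epipolar_cost(x0, y0, x0e, y0e, ae, be, ge, e, z_min, z_max):
    """E_ep on a circular aperture of radius e (assumed closed form, see lead-in)."""
    m0, m1, m2 = depth_moments(z_min, z_max)
    xh, yh = x0 - x0e, y0 - y0e              # estimated translation components

    def integral(p, q):                      # integral of (p/Z + q)^2 over depth
        return p * p * m2 + 2.0 * p * q * m1 + q * q * m0

    lin_x = integral(y0e, -(ae + ge * xh))   # coefficient of x in the residual
    lin_y = integral(-x0e, -(be + ge * yh))  # coefficient of y in the residual
    const = integral(x0e * y0 - y0e * x0, ae * xh + be * yh)
    return math.pi * e * e * (0.25 * e * e * (lin_x + lin_y) + const)

def rotation_error_at_minimum(x0e, y0e, z_min, z_max):
    """Case (a): rotational error compensating a given translational error."""
    m0, m1, _ = depth_moments(z_min, z_max)
    k = m1 / m0                              # mean inverse depth
    return (y0e * k, -x0e * k, 0.0)

def flattest_direction_alpha(y0_hat, z_min, z_max):
    """Case (c): alpha component of the direction of smallest curvature."""
    m0, m1, _ = depth_moments(z_min, z_max)
    p = z_min * z_max
    a = m0 * (p - 1.0)
    return 2.0 * y0_hat * p * m1 / (a + math.sqrt(a * a + 4.0 * p * p * m1 * m1))

if __name__ == "__main__":
    x0e, y0e = 0.05, 0.08                    # hypothetical translational error
    ae, be, ge = rotation_error_at_minimum(x0e, y0e, 1.0, 4.0)
    print("compensating rotation error:", ae, be, ge)
    print("orthogonality check:", x0e * ae + y0e * be)
    print("cost:", planar_epipolar_cost(0.3, -0.2, x0e, y0e, ae, be, ge, 0.2, 1.0, 4.0))
    print("alpha along flattest direction:", flattest_direction_alpha(1.0, 1.0, 4.0))
```

The compensating rotation scales with the mean inverse depth $(\ln Z_{\max} - \ln Z_{\min})/(Z_{\max} - Z_{\min})$, so a translational error cannot be absorbed without a rotational error on the plane.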


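Epipolar Minimization on the Sphere. The function representing deviation from the
epipolar constraint on the sphere takes the simple form

$$E_{ep} = \int_{R_{\min}}^{R_{\max}} \iint_{\text{sphere}} \left\{ \left( \frac{\mathbf{r} \times (\mathbf{r} \times \mathbf{t})}{R} - (\boldsymbol{\omega}_\epsilon \times \mathbf{r}) \right) \cdot (\hat{\mathbf{t}} \times \mathbf{r}) \right\}^{2} dA\, dR,$$

where $A$ refers to a surface element. Due to the sphere’s symmetry, for each point $\mathbf{r}$ on
the sphere, there exists a point with coordinates $-\mathbf{r}$. Since $\mathbf{u}_{\mathrm{tr}}(\mathbf{r}) = \mathbf{u}_{\mathrm{tr}}(-\mathbf{r})$ and $\mathbf{u}_{\mathrm{rot}}(\mathbf{r}) =
-\mathbf{u}_{\mathrm{rot}}(-\mathbf{r})$, when the integrand is expanded the product terms integrated over the sphere vanish.
Thus

$$E_{ep} = \int_{R_{\min}}^{R_{\max}} \iint_{\text{sphere}} \left\{ \frac{\bigl( (\mathbf{t} \times \hat{\mathbf{t}}) \cdot \mathbf{r} \bigr)^{2}}{R^{2}} + \bigl( (\boldsymbol{\omega}_\epsilon \times \mathbf{r}) \cdot (\hat{\mathbf{t}} \times \mathbf{r}) \bigr)^{2} \right\} dA\, dR.$$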


(a) Assuming that translation $\hat{\mathbf{t}}$ has been estimated, the $\boldsymbol{\omega}_\epsilon$ that minimizes $E_{ep}$ is $\boldsymbol{\omega}_\epsilon = 0$,
since the resulting function is non-negative quadratic in $\boldsymbol{\omega}_\epsilon$ (minimum at zero). The difference
between sphere and plane is already clear. In the spherical case, as shown here, if an error in
the translation is made we do not need to compensate for it by making an error in the rotation
($\boldsymbol{\omega}_\epsilon = 0$), while in the planar case we need to compensate to ensure that the orthogonality
constraint is satisfied!

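(b) Assuming that rotation has been estimated with an error $\boldsymbol{\omega}_\epsilon$, what is the translation $\hat{\mathbf{t}}$
that minimizes $E_{ep}$? Since $R$ is uniformly distributed, integrating over $R$ does not alter the
form of the error in the optimization. Thus, $E_{ep}$ consists of the sum of two terms:

$$K = K_1 \iint_{\text{sphere}} \bigl( (\mathbf{t} \times \hat{\mathbf{t}}) \cdot \mathbf{r} \bigr)^{2} dA \quad \text{and} \quad L = L_1 \iint_{\text{sphere}} \bigl( (\boldsymbol{\omega}_\epsilon \times \mathbf{r}) \cdot (\hat{\mathbf{t}} \times \mathbf{r}) \bigr)^{2} dA,$$

where $K_1, L_1$ are multiplicative factors depending only on $R_{\min}$ and $R_{\max}$. For angles between
$\mathbf{t}, \hat{\mathbf{t}}$ and $\hat{\mathbf{t}}, \boldsymbol{\omega}_\epsilon$ in the range of 0 to $\pi/2$, $K$ and $L$ are monotonic functions. $K$ attains its minimum
when $\mathbf{t} = \hat{\mathbf{t}}$ and $L$ when $\hat{\mathbf{t}} \perp \boldsymbol{\omega}_\epsilon$. Consider a certain distance between $\mathbf{t}$ and $\hat{\mathbf{t}}$ leading to a certain
value $K$, and change the position of $\hat{\mathbf{t}}$. $L$ takes its minimum when $(\mathbf{t} \times \hat{\mathbf{t}}) \cdot \boldsymbol{\omega}_\epsilon = 0$, as follows
from the cosine theorem. Thus $E_{ep}$ achieves its minimum when $\hat{\mathbf{t}}$ lies on the great circle passing
through $\mathbf{t}$ and $\boldsymbol{\omega}_\epsilon$, with the exact position depending on $|\boldsymbol{\omega}_\epsilon|$ and the scene in view.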

(c) For the general case where no information about rotation or translation is available, we
study the subspaces where $E_{ep}$ changes the least at its absolute minimum, i.e., we are again
interested in the direction of the smallest second derivative at 0. For points defined by this
direction we calculate $\mathbf{t} = \hat{\mathbf{t}}$ and $\boldsymbol{\omega}_\epsilon \perp \mathbf{t}$.
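The behavior of the two spherical terms can be explored with a simple quadrature. The Python sketch below approximates the surface integrals in $K$ and $L$ (with $K_1 = L_1 = 1$) on a Fibonacci lattice; the lattice, the function names and the closed form quoted in the final comment are assumptions made for this check, the latter being a hand evaluation rather than a formula from the text.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def fibonacci_sphere(count):
    """Roughly uniform points on the unit sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(count):
        z = 1.0 - (2.0 * i + 1.0) / count
        rho = math.sqrt(1.0 - z * z)
        phi = golden_angle * i
        points.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return points

def sphere_integral(func, points):
    """Approximate the surface integral of func over the unit sphere."""
    return 4.0 * math.pi * sum(func(r) for r in points) / len(points)

def k_term(t, t_hat, points):
    """Surface integral of ((t x t_hat) . r)^2."""
    normal = cross(t, t_hat)
    return sphere_integral(lambda r: dot(normal, r) ** 2, points)

def l_term(w_err, t_hat, points):
    """Surface integral of ((w_err x r) . (t_hat x r))^2."""
    return sphere_integral(
        lambda r: dot(cross(w_err, r), cross(t_hat, r)) ** 2, points)

if __name__ == "__main__":
    pts = fibonacci_sphere(5000)
    w_err = (0.1, 0.0, 0.0)                      # hypothetical rotational error
    s = math.sqrt(0.5)
    for label, t_hat in (("perpendicular", (0.0, 0.0, 1.0)),
                         ("45 degrees", (s, 0.0, s)),
                         ("parallel", (1.0, 0.0, 0.0))):
        print(label, l_term(w_err, t_hat, pts))
    # Hand-derived reference for unit t_hat: 4*pi*(7*(w.t_hat)^2 + |w|^2)/15.
```

The printed values grow as $\hat{\mathbf{t}}$ turns from perpendicular to parallel to $\boldsymbol{\omega}_\epsilon$, while `k_term` vanishes when $\hat{\mathbf{t}} = \mathbf{t}$, in line with the statements about $K$ and $L$ in case (b).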


To study the negative depth values described by function (9) a more geometric interpretation
is needed. Substituting into (6) the value of $\dot{\mathbf{r}}$ from (1) or (2) gives

$$\hat{Z}\ (\text{or } \hat{R}) = \frac{\mathbf{u}_{\mathrm{tr}}(\hat{\mathbf{t}}) \cdot \mathbf{n}}{\left( \dfrac{\mathbf{u}_{\mathrm{tr}}(\mathbf{t})}{Z\ (\text{or } R)} + \mathbf{u}_{\mathrm{rot}}(\boldsymbol{\omega}_\epsilon) \right) \cdot \mathbf{n}}.$$

This equation shows that for every $\mathbf{n}$ and $\mathbf{r}$ a range of values for $Z$ (or $R$) is obtained which
result in negative estimates of $\hat{Z}$ (or $\hat{R}$). Thus for each direction $\mathbf{n}$, considering all image points
$\mathbf{r}$, we obtain a volume in space corresponding to negative depth estimates. The sum of all these
volumes for all directions is termed the “negative depth” volume, and calculating 3D motion in
this case amounts to minimizing this volume. Minimization of this volume provides conditions
for the errors in the motion parameters.


Minimizing Negative Depth Volume on the Plane. This analysis is given in [6]. The
findings are summarized here:

(a) Assume that rotation has been estimated with an error $(\alpha_\epsilon, \beta_\epsilon, \gamma_\epsilon)$. Then the error
$(x_{0\epsilon}, y_{0\epsilon})$ that minimizes the negative depth volume satisfies the orthogonality constraint
$x_{0\epsilon} / y_{0\epsilon} = -\beta_\epsilon / \alpha_\epsilon$.

(b) In the absence of any prior information about the 3D motion, the solution obtained by
minimizing the negative depth volume has errors that satisfy the orthogonality constraint
$x_{0\epsilon} / y_{0\epsilon} = -\beta_\epsilon / \alpha_\epsilon$, the line constraint $x_0 / y_0 = \hat{x}_0 / \hat{y}_0$ and $\gamma_\epsilon = 0$.

Minimizing  Negative  Depth  Volume  on  the  Sphere 

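(a) Assuming that the rotation has been estimated with an error $\boldsymbol{\omega}_\epsilon$, what is the optimal
translation $\hat{\mathbf{t}}$ that minimizes the negative depth volume?

Since the motion field along different orientations $\mathbf{n}$ is considered, a parameterization is
needed to express all possible orientations on the sphere. This is achieved by selecting an
arbitrary vector $\mathbf{s}$; then, at each point $\mathbf{r}$ of the sphere, $\frac{\mathbf{s} \times \mathbf{r}}{|\mathbf{s} \times \mathbf{r}|}$ defines a direction in the tangent
plane. As $\mathbf{s}$ moves along half a circle, $\frac{\mathbf{s} \times \mathbf{r}}{|\mathbf{s} \times \mathbf{r}|}$ takes on every possible orientation (with the
exception of the points $\mathbf{r}$ lying on the great circle of $\mathbf{s}$). Let us pick $\boldsymbol{\omega}_\epsilon$ perpendicular to $\mathbf{s}$
($\mathbf{s} \cdot \boldsymbol{\omega}_\epsilon = 0$).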

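We are interested in the points in space with estimated negative range values $\hat{R}$. Since
$\mathbf{n} = \frac{\mathbf{s} \times \mathbf{r}}{|\mathbf{s} \times \mathbf{r}|}$ and $\mathbf{s} \cdot \boldsymbol{\omega}_\epsilon = 0$, the estimated range amounts to

$$\hat{R} = \frac{R\, (\hat{\mathbf{t}} \times \mathbf{s}) \cdot \mathbf{r}}{(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r} - R\, (\boldsymbol{\omega}_\epsilon \cdot \mathbf{r})(\mathbf{s} \cdot \mathbf{r})},$$

and $\hat{R} < 0$ if $\mathrm{sgn}[(\hat{\mathbf{t}} \times \mathbf{s}) \cdot \mathbf{r}] = -\mathrm{sgn}[(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r} - R (\boldsymbol{\omega}_\epsilon \cdot \mathbf{r})(\mathbf{s} \cdot \mathbf{r})]$, where $\mathrm{sgn}(x)$ provides the sign of $x$. This
constraint divides the surface of the sphere into four areas, I to IV, whose locations are defined
by the signs of the functions $(\hat{\mathbf{t}} \times \mathbf{s}) \cdot \mathbf{r}$, $(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r}$ and $(\boldsymbol{\omega}_\epsilon \cdot \mathbf{r})(\mathbf{s} \cdot \mathbf{r})$, as shown in Figure 3.

area | location | constraint on R
I | $\mathrm{sgn}((\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r}) = \mathrm{sgn}((\hat{\mathbf{t}} \times \mathbf{s}) \cdot \mathbf{r}) = \mathrm{sgn}((\mathbf{r} \cdot \boldsymbol{\omega}_\epsilon)(\mathbf{r} \cdot \mathbf{s}))$ | $|R| > \frac{(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r}}{(\mathbf{r} \cdot \boldsymbol{\omega}_\epsilon)(\mathbf{r} \cdot \mathbf{s})}$
II | $-\mathrm{sgn}((\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r}) = \mathrm{sgn}((\hat{\mathbf{t}} \times \mathbf{s}) \cdot \mathbf{r}) = \mathrm{sgn}((\mathbf{r} \cdot \boldsymbol{\omega}_\epsilon)(\mathbf{r} \cdot \mathbf{s}))$ | all $|R|$
III | $\mathrm{sgn}((\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r}) = -\mathrm{sgn}((\hat{\mathbf{t}} \times \mathbf{s}) \cdot \mathbf{r}) = \mathrm{sgn}((\mathbf{r} \cdot \boldsymbol{\omega}_\epsilon)(\mathbf{r} \cdot \mathbf{s}))$ | $|R| < \frac{(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r}}{(\mathbf{r} \cdot \boldsymbol{\omega}_\epsilon)(\mathbf{r} \cdot \mathbf{s})}$
IV | $\mathrm{sgn}((\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r}) = \mathrm{sgn}((\hat{\mathbf{t}} \times \mathbf{s}) \cdot \mathbf{r}) = -\mathrm{sgn}((\mathbf{r} \cdot \boldsymbol{\omega}_\epsilon)(\mathbf{r} \cdot \mathbf{s}))$ | none

Figure 3: Classification of image points according to constraints on $R$. The four areas are
marked by different colors. The textured parts (parallel lines) in areas I and III denote the
image points for which negative depth values exist if the scene is bounded. The two hemispheres
correspond to the front of the sphere and the back of the sphere, both as seen from the front
of the sphere.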

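For any direction $\mathbf{n}$ a volume of negative range values is obtained consisting of the volumes
above areas I, II and III. Areas II and III cover the same amount of area between the great
circles $(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r} = 0$ and $(\hat{\mathbf{t}} \times \mathbf{s}) \cdot \mathbf{r} = 0$, and area I covers a hemisphere minus the area between
$(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r} = 0$ and $(\hat{\mathbf{t}} \times \mathbf{s}) \cdot \mathbf{r} = 0$. If the scene in view is unbounded, that is, $R \in [0, +\infty)$,
there is for every $\mathbf{r}$ a range of values above areas I and III which result in negative depth
estimates; in area I the volume at each point $\mathbf{r}$ is bounded from below by $R = \frac{(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r}}{(\mathbf{r} \cdot \boldsymbol{\omega}_\epsilon)(\mathbf{r} \cdot \mathbf{s})}$,
and in area III it is bounded from above by $R = \frac{(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r}}{(\mathbf{r} \cdot \boldsymbol{\omega}_\epsilon)(\mathbf{r} \cdot \mathbf{s})}$. If there exist lower and upper
bounds $R_{\min}$ and $R_{\max}$ in the scene, we obtain two additional curves $C_{\min}$ and $C_{\max}$ with
$C_{\min} = (\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r} - R_{\min} (\boldsymbol{\omega}_\epsilon \cdot \mathbf{r})(\mathbf{s} \cdot \mathbf{r}) = 0$ and $C_{\max} = (\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r} - R_{\max} (\boldsymbol{\omega}_\epsilon \cdot \mathbf{r})(\mathbf{s} \cdot \mathbf{r}) = 0$, and
we obtain negative depth values in area I only between $C_{\max}$ and $(\mathbf{t} \times \mathbf{s}) \cdot \mathbf{r} = 0$ and in area III
only between $C_{\min}$ and $(\boldsymbol{\omega}_\epsilon \cdot \mathbf{r})(\mathbf{s} \cdot \mathbf{r}) = 0$. We are given $\boldsymbol{\omega}_\epsilon$ and $\mathbf{t}$, and we are interested in
the $\hat{\mathbf{t}}$ which minimizes the negative range volume. For any $\mathbf{s}$ the corresponding negative range
volume becomes smallest if $\hat{\mathbf{t}}$ is on the great circle through $\mathbf{t}$ and $\mathbf{s}$, that is, $(\mathbf{t} \times \mathbf{s}) \cdot \hat{\mathbf{t}} = 0$, as
will be shown next.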

Let us consider a $\hat{t}$ such that $(t \times s) \cdot \hat{t} \neq 0$ and let us change $\hat{t}$ so that $(t \times s) \cdot \hat{t} = 0$. As
$\hat{t}$ changes, the area of type II becomes an area of type IV and the area of type III becomes an
area of type I. The negative depth volume is changed as follows: it is decreased by the spaces
above area II and area III, and it is increased by the space above area I (which changed from
type III to type I). Clearly, the decrease is larger than the increase, which implies that the
smallest volume is obtained for $s$, $t$, $\hat{t}$ lying on a great circle. Since this is true for any $s$, the
minimum negative depth volume is attained for $t = \hat{t}$.^8
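
To make the quantity being minimized concrete, the negative range volume for a single direction $s$ can be approximated numerically. The sketch below is an illustration under stated assumptions, not the procedure used in the analysis: the function name `negative_range_volume` and its parameters are hypothetical, the scene is assumed bounded by $R_{\min}$ and $R_{\max}$, and the volume is measured as area on the sphere times length of the range interval, estimated by sampling image points at random and ranges on a regular grid.

```python
import numpy as np

def negative_range_volume(s, t, t_hat, omega_eps, R_min, R_max,
                          n_points=5000, n_ranges=200, seed=0):
    """Approximate the negative range volume for one flow direction s.

    Counts the fraction of (image point, range) pairs for which the estimated
    range is negative, i.e. sgn((t_hat x s).r) differs from
    sgn((t x s).r - R (omega_eps.r)(s.r)), and scales it by the area of the
    unit sphere times (R_max - R_min).
    """
    rng = np.random.default_rng(seed)
    r = rng.normal(size=(n_points, 3))
    r /= np.linalg.norm(r, axis=1, keepdims=True)   # uniform points on the sphere
    # Midpoints of n_ranges equal sub-intervals of [R_min, R_max].
    R = R_min + (np.arange(n_ranges) + 0.5) * (R_max - R_min) / n_ranges
    a = r @ np.cross(t, s)                # (t x s) . r
    a_hat = r @ np.cross(t_hat, s)        # (t_hat x s) . r
    b = (r @ omega_eps) * (r @ s)         # (omega_eps . r)(s . r)
    denom = a[:, None] - R[None, :] * b[:, None]
    negative = a_hat[:, None] * denom < 0
    return 4.0 * np.pi * (R_max - R_min) * negative.mean()
```

With no error at all the function returns zero, and with $\hat{t} = -t$ and $\omega_\epsilon = 0$ every sampled pair is counted, which gives the full value $4\pi(R_{\max} - R_{\min})$. Comparing values for different $\hat{t}$ at a fixed $s$ and $\omega_\epsilon$ is one way to explore the argument above empirically.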


(b) Next, assume that no prior knowledge about the 3D motion is available. We want to
know for which configurations of $t$ and $\omega_\epsilon$ the negative depth values change the least in the
neighborhood of the absolute minimum, that is, at $t_\epsilon = \omega_\epsilon = 0$. From the analysis above, it is
known that for any $\omega_\epsilon \neq 0$, $t = \hat{t}$. Next, we show that $\omega_\epsilon$ is indeed different from zero: Take
$\hat{t} \neq t$ on the great circle of $s$ and let $\omega_\epsilon$, as before, be perpendicular to $s$.

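Since $(t \times s) \times \omega_\epsilon = 0$, the curves $C_{\max}$ and $C_{\min}$ can be expressed as
$C_{\max(\min)} = (\omega_\epsilon \cdot r)\left(\frac{\sin\angle(t,s)}{R_{\max(\min)}|\omega_\epsilon|} - (s \cdot r)\right) = 0$,
where $\angle(t,s)$ denotes the angle between vectors $t$ and $s$. These
curves consist of the great circle $\omega_\epsilon \cdot r = 0$ and the circle
$\frac{\sin\angle(t,s)}{R_{\max(\min)}|\omega_\epsilon|} - (s \cdot r) = 0$ parallel to
the great circle $(s \cdot r) = 0$ (see Figure 4). If $\frac{\sin\angle(t,s)}{R_{\max(\min)}|\omega_\epsilon|} > 1$, this circle disappears.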
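
The factorization of the two curves can be spelled out step by step. The derivation below is a sketch under two assumptions that are not stated explicitly in the text: $t$ and $s$ are unit vectors, and the sign of $\omega_\epsilon$ is chosen so that $t \times s = \sin\angle(t,s)\,\omega_\epsilon / |\omega_\epsilon|$ (the opposite choice only flips the sign of the first term).

```latex
% Assumptions: |t| = |s| = 1 and t \times s = \sin\angle(t,s)\,\omega_\epsilon / |\omega_\epsilon|.
\begin{align*}
C(R) &= (t \times s) \cdot r - R\,(\omega_\epsilon \cdot r)(s \cdot r) \\
     &= \frac{\sin\angle(t,s)}{|\omega_\epsilon|}\,(\omega_\epsilon \cdot r)
        - R\,(\omega_\epsilon \cdot r)(s \cdot r) \\
     &= R\,(\omega_\epsilon \cdot r)
        \left( \frac{\sin\angle(t,s)}{R\,|\omega_\epsilon|} - s \cdot r \right).
\end{align*}
```

Since $R > 0$, setting $C(R) = 0$ with $R = R_{\max}$ or $R = R_{\min}$ gives the product form above. Because $s \cdot r \leq 1$, the second factor has no zero on the sphere once the ratio exceeds 1, which is why the corresponding circle disappears.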


Figure 4: Configuration for $t$ and $\hat{t}$ on the great circle of $s$ and $\omega_\epsilon$ perpendicular to $s$. The
textured part of area I denotes image points for which negative depth values exist if the scene
is bounded.

Consider next two flow directions defined by vectors $s_1$ and $s_2$ with $(s_1 \times t) = -(s_2 \times t)$
and $s_1$ between $t$ and $\hat{t}$.

For every point $r_1$ in area III defined by $s_1$ there exists a point $r_2$ in area I defined by $s_2$ such
that the negative estimated ranges above $r_1$ and $r_2$ add up to $R_{\max} - R_{\min}$. Thus the volume of
negative range obtained from $s_1$ and $s_2$ amounts to the area of the sphere times $(R_{\max} - R_{\min})$
(area II of $s_1$ contributes a hemisphere; area III of $s_1$ and area I of $s_2$ together contribute a
hemisphere). The total negative range volume can be decomposed into three components: a
component $V_1$ originating from the set of $s$ between $t$ and $\hat{t}$, a component $V_2$ originating from
the set of $s$ symmetric in $t$ to the set in $V_1$, and a component $V_3$ corresponding to the remaining
$s$, which consists of range values above areas of type I only. If for all $s$ in $V_3$, $\frac{\sin\angle(t,s)}{R_{\max}|\omega_\epsilon|} > 1$, $V_3$

^8 A word of caution about the parameterization used for directions $n = \frac{s \times r}{|s \times r|}$ is needed. It does not treat all
orientations equally (as $s$ varies along a great circle with constant speed, $s \times r$ accelerates and decelerates). Thus
to obtain a uniform distribution, normalization is necessary. The normalization factors, however, do not affect
the previous proof, due to symmetry.
becomes zero. Thus for all $|\omega_\epsilon|$ small enough to satisfy this condition the negative range volume is equally large
and amounts to the area of the sphere times $(R_{\max} - R_{\min})$ times $\angle(t,\hat{t})$. Unless $R_{\max} = \infty$,
$|\omega_\epsilon|$ takes on values different from zero.

This shows that for any $t_\epsilon \neq 0$, there exist vectors $\omega_\epsilon \neq 0$ which give rise to the same
negative depth volume as $\omega_\epsilon = 0$. However, for any such $\omega_\epsilon \neq 0$ this volume is larger than the
volume obtained by setting $t_\epsilon = 0$. It follows that $t = \hat{t}$. From Figure 3, it can furthermore be
deduced that for a given $\omega_\epsilon$ the negative depth volume, which for $t = \hat{t}$ only lies above areas
of type I, decreases as $t$ moves along a great circle away from $\omega_\epsilon$, as the areas between $C_{\min}$
and $C_{\max}$ and between $C_{\min}$ and $(t \times s) \cdot r = 0$ decrease. This proves that in addition to $t = \hat{t}$,
$t \perp \omega_\epsilon$.

The  preceding  results  demonstrate  the  advantages  of  spherical  eyes  for  the  process  of  3D 
motion  estimation.  Table  1  lists  the  eight  out  of  ten  cases  which  lead  to  clearly  defined  error 
configurations.  It  shows  that  3D  motion  can  be  estimated  more  accurately  with  spherical  eyes. 
Depending  on  the  estimation  procedure  used — and  systems  might  use  different  procedures  for 
different  tasks — either  the  translation  or  the  rotation  can  be  estimated  very  accurately.  For 
planar  eyes,  this  is  not  the  case,  as  for  all  possible  procedures  there  exists  confusion  between 
the  translation  and  rotation.  The  error  configurations  also  allow  systems  with  inertial  sensors 
to use more efficient estimation procedures. If a system utilizes a gyrosensor which provides an
approximate  estimate  of  its  rotation,  it  can  employ  a  simple  algorithm  based  on  the  negative 
depth  constraint  for  only  translational  motion  fields  to  derive  its  translation  and  obtain  a  very 
accurate  estimate.  Such  algorithms  are  much  easier  to  implement  than  algorithms  designed  for 
completely  unknown  rigid  motions,  as  they  amount  to  searches  in  2D  as  opposed  to  5D  spaces 
[4].  Similarly,  there  exist  computational  advantages  for  systems  with  translational  inertial 
sensors  in  estimating  the  remaining  unknown  rotation. 
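
The 2D search mentioned above can be sketched as follows. This is a minimal illustration under explicit assumptions, not the algorithm of [4]: the spherical motion field is assumed to follow the convention $\dot{r} = \frac{1}{R}((t \cdot r)r - t) - \omega \times r$, the rotation estimate `omega_hat` is taken as given (for example from a gyrosensor), the candidate translation directions are sampled with a Fibonacci lattice, and all function names are hypothetical. After the rotational component is removed, the normal flow along $n$ equals $-(t \cdot n)/R$, so a candidate $\hat{t}$ implies a negative range wherever the derotated normal flow and $\hat{t} \cdot n$ have the same sign.

```python
import numpy as np

def fibonacci_sphere(k):
    """Return k roughly evenly spaced unit vectors (candidate translation directions)."""
    i = np.arange(k)
    z = 1.0 - 2.0 * (i + 0.5) / k
    phi = i * np.pi * (3.0 - np.sqrt(5.0))   # golden-angle increments
    rho = np.sqrt(1.0 - z * z)
    return np.stack([rho * np.cos(phi), rho * np.sin(phi), z], axis=1)

def derotate(r, n, u_n, omega_hat):
    """Remove the rotational part of the normal flow u_n measured along n at points r."""
    return u_n + np.einsum("ij,ij->i", np.cross(omega_hat, r), n)

def negative_depth_count(t_hat, r, n, u_n, omega_hat):
    """Number of measurements whose implied range is negative for candidate t_hat."""
    d = derotate(r, n, u_n, omega_hat)
    return int(np.sum(d * (n @ t_hat) > 0))

def search_translation(r, n, u_n, omega_hat, k=2000):
    """Search the sphere of directions for the candidate with the fewest negative depths."""
    candidates = fibonacci_sphere(k)
    d = derotate(r, n, u_n, omega_hat)
    # counts[j] = number of negative-depth measurements for candidate j
    counts = np.sum(d[:, None] * (n @ candidates.T) > 0, axis=0)
    best = int(np.argmin(counts))
    return candidates[best], int(counts[best])
```

The search runs over a two-parameter family of directions only, which is the computational advantage referred to in the text; a finer lattice or a local refinement around the best candidate would improve the resolution of the estimate.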

In  nature,  systems  that  walk  and  perform  sophisticated  manipulation  have  camera-type 
eyes,  and  systems  that  fly  usually  have  panoramic  vision,  either  through  compound  eyes  or  a 
combination  of  camera-type  eyes.  The  obvious  explanation  for  this  difference  is  the  need  for 
a  larger  field  of  view  in  flying  species,  and  the  need  for  very  accurate  segmentation  and  shape 
estimation,  and  thus  high  resolution  in  a  limited  field  of  view,  for  land-walking  species.  As 
shown  in  this  paper,  the  geometry  of  the  sphere  also  provides  a  computational  advantage;  it 
allows  for  more  efficient  and  accurate  egomotion  estimation  (even  at  the  expense  of  trading 
off  resolution  in  some  systems,  for  example,  in  insects),  and  this  is  much  more  necessary  for 
systems  flying  and  thus  moving  with  all  six  degrees  of  freedom  than  for  systems  moving  with 
usually  limited  rigid  motion  on  surfaces. 

The  above  results  also  point  to  ways  of  constructing  new,  powerful  eyes  by  taking  advantage 
of  both  the  panoramic  vision  of  flying  systems  and  the  high-resolution  vision  of  primates.  An 
eye  like  the  one  in  Figure  5,  assembled  from  a  few  video  cameras  arranged  on  the  surface  of 
a sphere,^9 can easily estimate 3D motion since, while it is moving, it is sampling a spherical
motion  field!  Even  more  important  for  today’s  applications  is  the  reconstruction  of  the  shape 
of  an  object  or  scene  in  a  very  accurate  manner.  Accurate  shape  models  are  needed  in  many 
applications  dealing  with  visualization,  as  in  video  editing/manipulation  or  in  virtual  reality 
settings  [16, 17].  To  obtain  accurate  shape  reconstruction,  both  the  3D  transformation  relating 
two  views  and  the  2D  transformation  relating  two  images  are  needed  with  good  precision. 
Given accurate 3D motion $(t, \omega)$ and image motion $(\dot{r})$, equations (1-3) can be used in a
straightforward  manner  to  estimate  depth  (Z)  or  range  (R)  and  thus  object  shape.  An  eye 
like  the  one  in  Figure  5  not  only  has  panoramic  properties,  eliminating  the  rotation/translation 

^9 Like a compound eye with video cameras replacing ommatidia.
Table 1: Summary of results

Epipolar minimization, given optic flow

  I. Spherical eye
    (a) Given a translational error $t_\epsilon$, the rotational error $\omega_\epsilon = 0$.
    (b) Without any prior information, $t_\epsilon = 0$ and $\omega_\epsilon \perp t$.

  II. Camera-type eye
    (a) For a fixed translational error $(x_{0\epsilon}, y_{0\epsilon})$, the rotational error
        $(\alpha_\epsilon, \beta_\epsilon, \gamma_\epsilon)$ is of the form $\gamma_\epsilon = 0$,
        $\alpha_\epsilon/\beta_\epsilon = -x_{0\epsilon}/y_{0\epsilon}$.
    (b) Without any a priori information about the motion, the errors satisfy $\gamma_\epsilon = 0$,
        $\alpha_\epsilon/\beta_\epsilon = -x_{0\epsilon}/y_{0\epsilon}$, $x_0/y_0 = x_{0\epsilon}/y_{0\epsilon}$.

Minimization of negative depth volume, given normal flow

  I. Spherical eye
    (a) Given a rotational error $\omega_\epsilon$, the translational error $t_\epsilon = 0$.
    (b) Without any prior information, $t_\epsilon = 0$ and $\omega_\epsilon \perp t$.

  II. Camera-type eye
    (a) Given a rotational error, the translational error is of the form
        $-x_{0\epsilon}/y_{0\epsilon} = \alpha_\epsilon/\beta_\epsilon$.
    (b) Without any error information, the errors satisfy $\gamma_\epsilon = 0$,
        $\alpha_\epsilon/\beta_\epsilon = -x_{0\epsilon}/y_{0\epsilon}$, $x_0/y_0 = x_{0\epsilon}/y_{0\epsilon}$.

confusion,  but  it  has  the  unexpected  benefit  of  making  it  easy  to  estimate  image  motion  with 
high  accuracy.  Any  two  cameras  with  overlapping  fields  of  view  also  provide  high-resolution 
stereo  vision,  and  this  collection  of  stereo  systems  makes  it  possible  to  locate  a  large  number 
of  depth  discontinuities.  Given  scene  discontinuities,  image  motion  can  be  estimated  very 
accurately [8]. As a consequence, the eye in Figure 5 is very well suited to developing accurate
models  of  the  world,  and  many  experiments  have  confirmed  this  finding.  However,  such  an  eye, 
although  appropriate  for  a  moving  robotic  system,  may  be  impractical  to  use  in  a  laboratory. 
Fortuitously  from  a  mathematical  viewpoint,  it  makes  no  difference  whether  the  cameras  are 
looking  inward  or  outward!  Consider,  then,  a  “negative”  spherical  eye  like  the  one  in  Figure 
6,  where  video  cameras  are  arranged  on  the  surface  of  a  sphere  pointing  toward  its  center. 
Imaging  a  moving  rigid  object  at  the  center  of  the  sphere  creates  image  motion  fields  at  the 
center  of  each  camera  which  are  the  same  as  the  ones  that  would  be  created  if  the  whole 
spherical  dome  were  moving  with  the  opposite  rigid  motion!  Thus,  utilizing  information  from 
all  the  cameras,  the  3D  motion  of  the  object  inside  the  sphere  can  be  accurately  estimated, 
and  at  the  same  time  accurate  shape  models  can  be  obtained  from  the  motion  field  of  each 
camera. The negative spherical eye also allows for accurate recovery of models of action, such
as human movement, because, by putting together motion and shape, sequences of 3D motion fields
representing the motion inside the dome can be estimated. Such action models will find many
applications in telereality, graphics and recognition. The configurations described above are
examples of alternative sensors, and they also demonstrate that multiple-view vision has great
potential. Different arrangements best suited for other problems can be imagined. This was
perhaps  foreseen  in  ancient  Greek  mythology,  which  has  Argus,  the  hundred-eyed  guardian  of 
Hera,  the  goddess  of  Olympus,  defeating  a  whole  army  of  Cyclopes,  one-eyed  giants! 


Figure  5:  A  compound-like  eye  composed  of  conventional  video  cameras,  arranged  on  a  sphere 
and  looking  outward. 